Clustering in Complex Directed Networks 

Giorgio Fagiolo



Sant'Anna School of Advanced Studies, 
Laboratory of Economics and Management, 
Piazza Martiri della Libertà 33, I-56127 Pisa, Italy.
(Dated: December 2006) 

Abstract 

Many empirical networks display an inherent tendency to cluster, i.e. to form circles of connected 
nodes. This feature is typically measured by the clustering coefficient (CC). The CC, originally 
introduced for binary, undirected graphs, has been recently generalized to weighted, undirected 
networks. Here we extend the CC to the case of (binary and weighted) directed networks and we 
compute its expected value for random graphs. We distinguish between CCs that count all directed 
triangles in the graph (independently of the direction of their edges) and CCs that only consider 
particular types of directed triangles (e.g., cycles). The main concepts are illustrated by employing 
empirical data on world-trade flows.

Networked structures emerge almost ubiquitously in complex systems. Examples include
the Internet and the WWW, airline connections, scientific collaborations and citations, trade
and labor-market contacts, friendship and other social relationships, business relations and
R&D partnerships, cellular, ecological and neural networks [2].

The majority of such "real-world" networks have been shown to display structural
properties that are neither those of a random graph nor those of regular lattices. For example,
many empirically-observed networks are small worlds. These networks are simultaneously
characterized by two features [3]. First, as happens for random graphs, their diameter
[31] increases only logarithmically with the number of nodes. This means that, even if the
network is very large, any two seemingly unrelated nodes can reach each other in a few
steps. Second, as happens in lattices, small-world networks are highly clustered, i.e. any
two neighbors of a given node have a probability of being themselves neighbors which is
much larger than in random graphs.

Network clustering is a well-known concept in sociology, where notions such as "cliques"
and "transitive triads" have been widely employed [10]. For example, friendship networks
are typically highly clustered (i.e. they display high cliquishness) because any two friends
of a person are very likely to be friends.

The tendency of a network to form tightly connected neighborhoods (more than in the
random uncorrelated case) can be measured by the clustering coefficient (CC), see [10] and
[12]. The idea is very simple. Consider a binary, undirected network (BUN) described by the
graph $G = (N, A)$, where $N$ is the number of nodes and $A = \{a_{ij}\}$ is the $N \times N$ adjacency
matrix, whose generic element $a_{ij} = 1$ if and only if there is an edge connecting nodes $i$ and
$j$ (i.e. if they are neighbors) and zero otherwise. Since the network is undirected, $A$ is
symmetric [32]. For any given node $i$, let $d_i$ be its degree, i.e. the number of $i$'s neighbors.

The extent to which $i$'s neighborhood is clustered can be measured by the fraction of
pairs of $i$'s neighbors that are themselves neighbors, i.e. by the ratio between the number of
triangles in the graph $G$ with $i$ as one vertex (labeled as $t_i$) and the number of all possible
triangles that $i$ could have formed (that is, $T_i = d_i(d_i - 1)/2$) [33]. It is easy to see that the
CC for node $i$ in this case reads:

$$C_i(A) = \frac{\frac{1}{2}\sum_{j \neq i}\sum_{h \neq (i,j)} a_{ij}a_{ih}a_{jh}}{\frac{1}{2}d_i(d_i - 1)} = \frac{(A^3)_{ii}}{d_i(d_i - 1)}, \tag{1}$$

where $(A^3)_{ii}$ is the $i$-th element of the main diagonal of $A^3 = A \cdot A \cdot A$. Each product
$a_{ij}a_{ih}a_{jh}$ is meant to count whether a triangle exists or not around $i$. Notice that the order
of subscripts is irrelevant, as $A$ is symmetric. Of course, $C_i \in [0, 1]$. The
overall (network-wide) CC for the graph $G$ is then obtained by averaging $C_i$ over the $N$
nodes, i.e. $C = N^{-1}\sum_{i=1}^{N} C_i$. In the case of a random graph where each link is in place
with probability $p \in (0, 1)$, one has that $E[C] = p$ ($E$ stands for the expectation operator).
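
As a minimal sketch of eq. (1), the function below computes node-level clustering coefficients from a symmetric 0/1 adjacency matrix stored as plain Python lists. The name `clustering_bun`, the toy graph, and the convention of returning 0 for nodes with fewer than two neighbors (where eq. (1) is undefined) are assumptions made for illustration only.

```python
def clustering_bun(A):
    """Node-level clustering coefficients for a binary undirected network.

    A is a symmetric 0/1 adjacency matrix (list of lists) with zero diagonal.
    """
    n = len(A)
    coeffs = []
    for i in range(n):
        d = sum(A[i])  # degree of node i
        # (A^3)_ii counts every triangle around i twice (once per orientation).
        a3_ii = sum(A[i][j] * A[j][h] * A[h][i]
                    for j in range(n) for h in range(n))
        # Assumption: nodes with fewer than two neighbors get a coefficient of 0.
        coeffs.append(a3_ii / (d * (d - 1)) if d > 1 else 0.0)
    return coeffs


# Toy graph: a triangle 0-1-2 plus a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
C = clustering_bun(A)
C_overall = sum(C) / len(C)  # network-wide average
```

In the toy graph, node 0 has three neighbors of which only one pair is linked, so its coefficient is 1/3, while nodes 1 and 2 obtain 1.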

Binary networks treat all edges present in $G$ as if they were completely homogeneous. More
recently, scholars have become increasingly aware of the fact that real networks exhibit a
relevant heterogeneity in the capacity and intensity of their connections [13, 14, 15, 16, 17, 18].
Allowing for this heterogeneity might be crucial to better understand the architecture
of complex networks. In order to incorporate this previously neglected feature, each edge $ij$
present in $G$ (i.e. such that $a_{ij} = 1$) is assigned a value $w_{ij} > 0$ proportional to the weight of
that link in the network. For example, weights can account for the amount of trade volumes
exchanged between countries (as a fraction of their gross domestic product), the number of
passengers travelling between airports, the traffic between two Internet nodes, the number
of e-mails exchanged between pairs of individuals, etc. Without loss of generality, we can
suppose that $w_{ij} \in [0, 1]$ [34]. A weighted undirected network (WUN) is thus characterized
by its $N \times N$ symmetric weight matrix $W = \{w_{ij}\}$, where $w_{ii} = 0$ for all $i$. Many network
measures developed for BUNs have a direct counterpart in WUNs. For example, the concept
of node degree can be replaced by that of node strength [13]:

$$s_i = \sum_{j \neq i} w_{ij}. \tag{2}$$



For more complicated measures, however, extensions to WUNs are not straightforward.
To generalize the CC of node $i$ to WUNs, one has indeed to take into account the weights
associated to edges in the neighborhood of $i$. There are many ways to do that. For
example, suppose that a triangle $ihj$ is in place. One might then consider only weights of
the edges $ih$ and $ij$ [13]. Alternatively, one might employ the weights of all the edges in the
triangle. In turn, the total contribution of a triangle can be defined as the geometric mean
of its weights [20] or simply as the product among them [21, 22, 23, 24]. In what follows,
we will focus on the extension of the CC to WUNs originally introduced in [20]:

$$\tilde{C}_i(W) = \frac{\frac{1}{2}\sum_{j \neq i}\sum_{h \neq (i,j)} w_{ij}^{1/3} w_{ih}^{1/3} w_{jh}^{1/3}}{\frac{1}{2}d_i(d_i - 1)} = \frac{(W^{[1/3]})^3_{ii}}{d_i(d_i - 1)}, \tag{3}$$

where we define $W^{[1/k]} = \{w_{ij}^{1/k}\}$, i.e. the matrix obtained from $W$ by taking the $k$-th root
of each entry. As discussed in [19], the measure $\tilde{C}_i$ ranges in $[0, 1]$ and reduces to $C_i$ when
weights become binary. Furthermore, it takes into account weights of all edges in a triangle
(but does not consider weights not participating in any triangle) and is invariant to weight
permutation for one triangle. Notice that $\tilde{C}_i = 1$ only if the neighborhood of $i$ actually
contains all possible triangles that can be formed and each edge participating in these
triangles has unit (maximum) weight. Again, one can define the overall clustering coefficient
for WUNs as $\tilde{C} = N^{-1}\sum_{i=1}^{N} \tilde{C}_i$.
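
Eq. (3) can be sketched in the same way. The function name `clustering_wun`, the example weights, and the zero value returned for nodes with fewer than two neighbors are illustrative assumptions; the degree $d_i$ is taken to be the number of strictly positive weights in row $i$.

```python
def clustering_wun(W):
    """Weighted clustering coefficient of eq. (3) for a symmetric weight matrix W.

    W is a list of lists with entries in [0, 1] and a zero diagonal.
    """
    n = len(W)
    # R = W^[1/3]: the cube root of every entry of W.
    R = [[w ** (1.0 / 3.0) for w in row] for row in W]
    coeffs = []
    for i in range(n):
        d = sum(1 for w in W[i] if w > 0)  # degree = number of positive weights
        num = sum(R[i][j] * R[j][h] * R[h][i]
                  for j in range(n) for h in range(n))
        # Assumption: nodes with fewer than two neighbors get a coefficient of 0.
        coeffs.append(num / (d * (d - 1)) if d > 1 else 0.0)
    return coeffs


# A triangle whose three edges all carry weight 1/8.
W = [[0.0, 0.125, 0.125],
     [0.125, 0.0, 0.125],
     [0.125, 0.125, 0.0]]
C_w = clustering_wun(W)

# With binary weights the measure reduces to the unweighted coefficient.
C_b = clustering_wun([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
```

Each edge of the weighted triangle contributes a cube root of 1/2, so every node obtains a coefficient of 1/8, whereas the binary triangle gives 1 for every node.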

In this paper we discuss extensions of the CC for BUNs and WUNs (eqs. (1) and (3)) to the
case of directed networks. It is well-known that many real-world complex networks involve
non-mutual relationships, which imply non-symmetric adjacency or weight matrices. For
instance, trade volumes between countries [25, 26, 27] are implicitly directional relations, as
the export from country $i$ to country $j$ is typically different from the export from country $j$
to country $i$ (i.e. imports of $i$ from $j$). If such networks are symmetrized (e.g., by averaging
imports and exports of country $i$), one could possibly underestimate important aspects of
their network architecture.

Alternative extensions of the CC to weighted or directed networks have been recently
introduced in the literature on "network motifs" [35]. As mentioned, [20] generalizes the
CC to weighted (and possibly directed) networks. Similarly, [28] compute the recurrence
of all types of three-node connected subgraphs in a variety of real-world binary directed
networks from biochemistry, neurobiology, ecology and engineering. However, the weighted
CC in [20] does not explicitly discriminate between different directed triangles (cf. Figure 1),
while [28] do not allow for a weighted analysis. This work attempts to bridge the two latter
approaches and presents a unifying framework where, in addition to the measures already
discussed in [20, 28], one is able to: (i) explicitly account for directed and weighted links;
and (ii) define a weighted, directed version of the CC for any type of triangle pattern
(i.e., three-node connected subgraph). To compute such coefficients, we shall employ the actual
and potential number of directed-triangle patterns of any given type.



Preliminaries. In directed networks, edges are oriented and neighboring relations are not
necessarily symmetric. In the case of binary directed networks (BDNs), we define the
in-degree of node $i$ as the number of edges pointing towards $i$ (i.e., inward edges). The
out-degree of node $i$ is accordingly defined as the number of edges originating from $i$ (i.e., outward
edges). Formally:

$$d_i^{in} = \sum_{j \neq i} a_{ji} = (A^T)_i \mathbf{1} \tag{4}$$
$$d_i^{out} = \sum_{j \neq i} a_{ij} = (A)_i \mathbf{1}, \tag{5}$$

where $A^T$ is the transpose of $A$, $(A)_i$ stands for the $i$-th row of $A$, and $\mathbf{1}$ is the $N$-dimensional
column vector $(1, 1, \dots, 1)^T$. The total degree of node $i$ is simply the sum of its in- and
out-degree:

$$d_i^{tot} = d_i^{in} + d_i^{out} = (A^T + A)_i \mathbf{1}. \tag{6}$$

Finally, provided that no self-interactions are present, the number of bilateral edges between
$i$ and its neighbors (i.e. the number of nodes $j$ for which both an edge $i \to j$ and an edge
$j \to i$ exist) is computed as:

$$d_i^{\leftrightarrow} = \sum_{j \neq i} a_{ij}a_{ji} = (A^2)_{ii}. \tag{7}$$

It is easy to see that in BUNs one has: $d_i = d_i^{tot} - d_i^{\leftrightarrow}$.

The above measures can be easily extended to weighted directed networks (WDNs), by
considering in-, out- and total-strength (see eq. (2)).
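
The degree measures in eqs. (4)-(7) translate directly into code. The sketch below assumes a 0/1 adjacency matrix with zero diagonal in which `A[i][j] = 1` encodes an edge $i \to j$; the function name `directed_degrees` and the three-node example are hypothetical.

```python
def directed_degrees(A):
    """In-, out-, total and bilateral degrees (eqs. 4-7) of a binary directed network.

    A[i][j] = 1 means there is an edge i -> j; the diagonal is assumed to be zero.
    """
    n = len(A)
    d_in = [sum(A[j][i] for j in range(n)) for i in range(n)]   # column sums
    d_out = [sum(A[i][j] for j in range(n)) for i in range(n)]  # row sums
    d_tot = [d_in[i] + d_out[i] for i in range(n)]
    # Bilateral degree: neighbors j with both i -> j and j -> i, i.e. (A^2)_ii.
    d_bi = [sum(A[i][j] * A[j][i] for j in range(n)) for i in range(n)]
    return d_in, d_out, d_tot, d_bi


# Edges: 0 -> 1, 1 -> 0, 0 -> 2, 2 -> 1.
A = [[0, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]
d_in, d_out, d_tot, d_bi = directed_degrees(A)
```

Here only the pair (0, 1) is linked in both directions, so nodes 0 and 1 each have one bilateral edge and node 2 has none.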

Binary Directed Networks. We begin by introducing the most general extension of the CC
to BDNs, which considers all possible directed triangles formed by each node, no matter
the directions of their edges. Consider node $i$. When edges are directed, $i$ can generate up
to 8 different triangles with any pair of neighbors [36]. Any product of the form $a_{ij}a_{ih}a_{jh}$
captures one particular triangle, see Fig. 1 for an illustration.

FIG. 1: Binary directed graphs. All 8 different triangles with node $i$ as one vertex. Within each
triangle is reported the product of the form $a_{ij}a_{ih}a_{jh}$ that works as indicator of that triangle in
the network.

The CC for node $i$ ($C_i^D$) in BDNs can be thus defined (like in BUNs) as the ratio between
all directed triangles actually formed by $i$ ($t_i^D$) and the number of all possible triangles that
$i$ could form ($T_i^D$). Therefore:

$$C_i^D(A) = \frac{t_i^D}{T_i^D} = \frac{\frac{1}{2}\sum_j \sum_h (a_{ij} + a_{ji})(a_{ih} + a_{hi})(a_{jh} + a_{hj})}{d_i^{tot}(d_i^{tot} - 1) - 2d_i^{\leftrightarrow}} = \frac{(A + A^T)^3_{ii}}{2[d_i^{tot}(d_i^{tot} - 1) - 2d_i^{\leftrightarrow}]}, \tag{8}$$

where (also in what follows) sums span over $j \neq i$ and $h \neq (i, j)$. In the first expression of eq.
(8), the numerator of the fraction is equal to $t_i^D$, as it simply counts all possible products of
the form $a_{ij}a_{ih}a_{jh}$ (cf. Fig. 1). To see that $T_i^D = d_i^{tot}(d_i^{tot} - 1) - 2d_i^{\leftrightarrow}$, notice that $i$ can be
possibly linked to a maximum of $d_i^{tot}(d_i^{tot} - 1)/2$ pairs of neighbors and with each pair can
form up to 2 triangles (as the edge between them can be oriented in two ways). This leaves
us with $d_i^{tot}(d_i^{tot} - 1)$ triangles. However, this number also counts "false" triangles formed
by $i$ and by a pair of directed edges linking $i$ with the same node, e.g. $i \to j$ and $j \to i$. There
are $d_i^{\leftrightarrow}$ such occurrences for node $i$, and for each of them we have wrongly counted two
"false" triangles. Therefore, by subtracting $2d_i^{\leftrightarrow}$ from the above number we get $T_i^D$. This
implies that $C_i^D \in [0, 1]$. The overall CC for BDNs is defined as $C^D = N^{-1}\sum_{i=1}^{N} C_i^D$.
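
A minimal sketch of eq. (8) follows, assuming a 0/1 adjacency matrix with zero diagonal. The name `clustering_bdn` and the convention of returning 0 when $T_i^D = 0$ are assumptions introduced for the example.

```python
def clustering_bdn(A):
    """Directed clustering coefficient C_i^D of eq. (8) for a binary directed network."""
    n = len(A)
    # S = A + A^T: the symmetrized matrix used in the numerator of eq. (8).
    S = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    coeffs = []
    for i in range(n):
        d_tot = sum(S[i])                                     # in-degree + out-degree
        d_bi = sum(A[i][j] * A[j][i] for j in range(n))       # bilateral edges
        t = sum(S[i][j] * S[j][h] * S[h][i]
                for j in range(n) for h in range(n)) / 2      # actual triangles t_i^D
        T = d_tot * (d_tot - 1) - 2 * d_bi                    # potential triangles T_i^D
        # Assumption: nodes that cannot form any triangle get a coefficient of 0.
        coeffs.append(t / T if T > 0 else 0.0)
    return coeffs


# A directed 3-cycle 0 -> 1 -> 2 -> 0.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
C_cycle = clustering_bdn(cycle)

# A fully bilateral triangle (equivalent to an undirected triangle).
full = [[0, 1, 1],
        [1, 0, 1],
        [1, 1, 0]]
C_full = clustering_bdn(full)
```

In the 3-cycle each node forms one of the two triangles it could form with its pair of neighbors, giving 1/2; in the bilateral triangle all 8 potential triangles are present, giving 1, as in the undirected case.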

The CC in eq. (8) has two nice properties. First, if $A$ is symmetric, then $C_i^D(A) = C_i(A)$,
i.e. it reduces to (1) when networks are undirected. To see this, note that if $A$ is symmetric
then $d_i^{tot} = 2d_i$ and $d_i^{\leftrightarrow} = d_i$. Hence:

$$C_i^D(A) = \frac{(2A)^3_{ii}}{2[2d_i(2d_i - 1) - 2d_i]} = \frac{(A^3)_{ii}}{d_i(d_i - 1)} = C_i(A). \tag{9}$$

Second, the expected value of $C_i^D$ in random graphs, where each edge is independently
in place with probability $p \in (0, 1)$ (i.e. the $a_{ij}$ are i.i.d. Bernoulli($p$) random variables), is still
$p$ (as happens for BUNs). Indeed, the expected value of $t_i^D$ is simply $4(N - 1)(N - 2)p^3$.
Furthermore, note that $d_i^{in} \sim d_i^{out} \sim BIN(N - 1, p)$ and $d_i^{tot} \sim BIN(2(N - 1), p)$. Hence
$E[d_i^{tot}(d_i^{tot} - 1)] = E[(d_i^{tot})^2] - E[d_i^{tot}] = 2(N - 1)(2N - 3)p^2$. Similarly, $E[d_i^{\leftrightarrow}] = (N - 1)p^2$,
which implies that $E[T_i^D] = 4(N - 1)(N - 2)p^2$ and finally that $E[C_i^D] = p$.
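
The result $E[C_i^D] = p$ can be checked numerically. The following sketch draws one random directed graph with independent edges and compares the average coefficient with $p$; the graph size, the seed, and the helper names `random_digraph` and `clustering_bdn` are arbitrary choices made for illustration, and a single draw will only match $p$ up to sampling noise.

```python
import random


def clustering_bdn(A):
    """Directed clustering coefficient C_i^D (eq. 8); 0 is returned when T_i^D = 0."""
    n = len(A)
    S = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    coeffs = []
    for i in range(n):
        d_tot = sum(S[i])
        d_bi = sum(A[i][j] * A[j][i] for j in range(n))
        t = sum(S[i][j] * S[j][h] * S[h][i]
                for j in range(n) for h in range(n)) / 2
        T = d_tot * (d_tot - 1) - 2 * d_bi
        coeffs.append(t / T if T > 0 else 0.0)
    return coeffs


def random_digraph(n, p, rng):
    """Each directed edge i -> j (i != j) is present independently with probability p."""
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)]
            for i in range(n)]


rng = random.Random(0)   # fixed seed, chosen arbitrarily
n, p = 60, 0.3
A = random_digraph(n, p, rng)
C = clustering_bdn(A)
C_mean = sum(C) / n      # should be close to p, up to sampling noise
```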

Weighted Directed Networks. The CC for BDNs defined above can be easily extended to
weighted graphs by replacing the number of directed triangles actually formed by $i$ ($t_i^D$)
with its weighted counterpart $\tilde{t}_i^D$. Given eq. (8), $\tilde{t}_i^D$ can be thus computed by substituting
$A$ with $W^{[1/3]}$. Hence:

$$\tilde{C}_i^D(W) = \frac{\tilde{t}_i^D}{T_i^D} = \frac{[W^{[1/3]} + (W^T)^{[1/3]}]^3_{ii}}{2[d_i^{tot}(d_i^{tot} - 1) - 2d_i^{\leftrightarrow}]}. \tag{10}$$

Note that when the graph is binary ($W = A$), then $W^{[1/3]} = W = A$. Hence, $\tilde{C}_i^D(A) =
C_i^D(A)$. Moreover, if $W$ is a symmetric weight matrix, then the numerator of $\tilde{C}_i^D(W)$
becomes $(2W^{[1/3]})^3_{ii} = 8(W^{[1/3]})^3_{ii}$. By combining this result with the denominator in eq. (9), one has that
$\tilde{C}_i^D(W) = \tilde{C}_i(W)$ for any symmetric $W$ [37].

To compute expected values of $\tilde{C}_i^D$ in random graphs, suppose that weights are drawn
using the following two-step algorithm. First, assume that any directed edge $i \to j$ is in
place with probability $p$ (independently across all possible directed edges). Second, let the
weight $w_{ij}$ of any existing directed edge (i.e., in place after the first step) be drawn from
an independent random variable uniformly distributed over $(0, 1]$ [38]. In this case, one has
that $E[w_{ij}^{1/3}] = 3p/4$. It easily follows that for this class of random weighted graphs:

$$E[\tilde{C}_i^D] = E[\tilde{C}_i] = \left(\frac{3}{4}\right)^3 p < p. \tag{11}$$

The overall CC for WDNs is again defined as $\tilde{C}^D = N^{-1}\sum_{i=1}^{N} \tilde{C}_i^D$.
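
The weighted directed coefficient of eq. (10) can be sketched as below. The function name `clustering_wdn` and the example matrices are assumptions; degrees are computed from the positions of strictly positive weights, and 0 is returned when $T_i^D = 0$.

```python
def clustering_wdn(W):
    """Weighted directed clustering coefficient of eq. (10).

    W[i][j] in [0, 1] is the weight of the edge i -> j (0 if absent); zero diagonal.
    """
    n = len(W)
    R = [[w ** (1.0 / 3.0) for w in row] for row in W]               # W^[1/3]
    S = [[R[i][j] + R[j][i] for j in range(n)] for i in range(n)]    # W^[1/3] + (W^T)^[1/3]
    coeffs = []
    for i in range(n):
        d_out = sum(1 for j in range(n) if W[i][j] > 0)
        d_in = sum(1 for j in range(n) if W[j][i] > 0)
        d_bi = sum(1 for j in range(n) if W[i][j] > 0 and W[j][i] > 0)
        d_tot = d_in + d_out
        t = sum(S[i][j] * S[j][h] * S[h][i]
                for j in range(n) for h in range(n)) / 2
        T = d_tot * (d_tot - 1) - 2 * d_bi
        # Assumption: nodes that cannot form any triangle get a coefficient of 0.
        coeffs.append(t / T if T > 0 else 0.0)
    return coeffs


# Directed 3-cycle 0 -> 1 -> 2 -> 0 with all weights equal to 1/8.
W = [[0.0, 0.125, 0.0],
     [0.0, 0.0, 0.125],
     [0.125, 0.0, 0.0]]
C_w = clustering_wdn(W)

# Scaling factor of eq. (11): E[w^(1/3)] = 3/4 for uniform weights, raised to the third power.
factor = (3 / 4) ** 3
```

With unit weights the same cycle would give 1/2 per node; with weights of 1/8 each cube root is 1/2 and the coefficient shrinks to 1/16. The factor in eq. (11) equals 27/64.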

Clustering and Patterns of Directed Triangles. The CCs for BDNs and WDNs defined above
treat all possible directed triangles as if they were the same, i.e. as if directions were irrelevant.
In other words, both $C^D$ and $\tilde{C}^D$ operate a symmetrization of the underlying directed graph
in such a way that the original asymmetric adjacency (resp. weight) matrix $A$ (resp. $W$)
is replaced by the symmetric matrix $A + A^T$ (resp. $W^{[1/3]} + (W^T)^{[1/3]}$). This means that
in the transformed graph, all directed edges are now bilateral. Furthermore, in binary
(respectively, weighted) graphs, edges that were already bilateral count as two (respectively,
receive a weight equal to the sum of the cube roots of the weights of the two directed edges).

However, in directed graphs triangles with edges pointing in different directions have a
completely different interpretation in terms of the resulting flow pattern. Put differently,
they account for different network motifs. Looking again at Figure 1, it is possible to single
out four patterns of directed triangles from $i$'s perspective. These are: (i) cycle, when there
exists a cyclical relation among $i$ and any two of its neighbors ($i \to j \to h \to i$, or vice versa);
(ii) middleman, when one of $i$'s neighbors (say $j$) both holds an outward edge to a third
neighbor (say $h$) and uses $i$ as a medium to reach $h$ in two steps [39]; (iii) in, where $i$ holds
two inward edges; and (iv) out, where $i$ holds two outward edges.

When one is interested in measuring clustering in directed networks, it is important to
separately account for each of the above patterns. This can be done by building a CC
for each pattern (in both BDNs and WDNs). As usual, each CC is defined as the ratio
between the number of triangles of that pattern actually formed by $i$ and the total number
of triangles of that pattern that $i$ can possibly form. Each CC will then convey information
about clustering of each different pattern within tightly connected directed neighborhoods.
In order to do that, we recall that the maximum number of all possible directed triangles
that $i$ can form (irrespective of their pattern) can be decomposed as:

$$T_i^D = d_i^{tot}(d_i^{tot} - 1) - 2d_i^{\leftrightarrow} = [d_i^{in}d_i^{out} - d_i^{\leftrightarrow}] + [d_i^{in}d_i^{out} - d_i^{\leftrightarrow}] + [d_i^{in}(d_i^{in} - 1)] + [d_i^{out}(d_i^{out} - 1)] = T_i^1 + T_i^2 + T_i^3 + T_i^4. \tag{12}$$

Let $\{T_i^{cyc}, T_i^{mid}, T_i^{in}, T_i^{out}\}$ be the maximum number of cycles, middlemen, ins and outs that $i$
can form. Inspection suggests that: $T_i^{cyc} = T_i^1$, $T_i^{mid} = T_i^2$, $T_i^{in} = T_i^3$ and $T_i^{out} = T_i^4$.
To see why, consider for example $T_i^{cyc}$. In that pattern type (see Figure 1, top panels),
node $i$ is characterized by one inward link and one outward link. The maximum number
of such patterns is given by $d_i^{in}d_i^{out}$. Again, this also counts "false" triangles, formed by $i$
and by a pair of directed edges pointing to and from the same node $j$. Therefore, one has
to subtract $d_i^{\leftrightarrow}$ to get $T_i^{cyc}$. Incidentally, notice that $T_i^{cyc} = T_i^{mid}$. The reason why this is
indeed the case becomes evident when one compares the top and the bottom pairs of triangle
patterns in Figure 1. Indeed, cycles and middlemen only differ in the orientation of the
link connecting the partners of the reference node ($i$), which does not affect the maximum
number of triangles that $i$ can form.

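In order to count all actual triangles formed by $i$, we notice that:

$$t_i^D = \tfrac{1}{2}(A + A^T)^3_{ii} = (A^3)_{ii} + (AA^TA)_{ii} + (A^TA^2)_{ii} + (A^2A^T)_{ii} = t_i^1 + t_i^2 + t_i^3 + t_i^4. \tag{13}$$

By letting $\{t_i^{cyc}, t_i^{mid}, t_i^{in}, t_i^{out}\}$ be the actual number of cycles, middlemen, ins and outs
formed by $i$, simple algebra reveals that $t_i^{cyc} = t_i^1$, $t_i^{mid} = t_i^2$, $t_i^{in} = t_i^3$ and $t_i^{out} = t_i^4$.
For example, denoting by $A_{(i)}$ and $A^{(i)}$ the $i$-th row and the $i$-th column of $A$:

$$t_i^{cyc} = \frac{1}{2}\sum_j \sum_h [a_{ij}a_{jh}a_{hi} + a_{ih}a_{hj}a_{ji}] = \frac{1}{2}[A_{(i)}AA^{(i)} + (A^T)_{(i)}A^T(A^T)^{(i)}] = A_{(i)}AA^{(i)} = (A^3)_{ii}. \tag{14}$$

Similarly:

$$t_i^{mid} = \frac{1}{2}\sum_j \sum_h [a_{ji}a_{jh}a_{ih} + a_{ij}a_{hj}a_{hi}] = \frac{1}{2}[(A^T)_{(i)}A(A^T)^{(i)} + A_{(i)}A^TA^{(i)}] = A_{(i)}A^TA^{(i)} = (AA^TA)_{ii}. \tag{15}$$

Notice that although $T_i^{cyc} = T_i^{mid}$, now in general $t_i^{cyc} \neq t_i^{mid}$ as long as $A$ is asymmetric.
Summing up, we can define a CC for each pattern as follows:

$$C_i^{*} = \frac{t_i^{*}}{T_i^{*}}, \tag{16}$$

where $\{*\} = \{cyc, mid, in, out\}$.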

In the case of weighted networks, it is straightforward to replace $t_i^{*}$ with its weighted
counterpart $\tilde{t}_i^{*}$, where the adjacency matrix $A$ has been replaced by $W^{[1/3]}$. We then accordingly
define:

$$\tilde{C}_i^{*} = \frac{\tilde{t}_i^{*}}{T_i^{*}}, \tag{17}$$

where $\{*\} = \{cyc, mid, in, out\}$. To summarize the above discussion, we report in Table I a
taxonomy of all possible triangles with related measures for BDNs and WDNs.
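
The pattern-specific counts and coefficients of eqs. (12)-(16) can be sketched as follows for the binary case. The function name `pattern_clustering`, the dictionary layout, and the zero value returned when $T_i^{*} = 0$ are assumptions made for this example; the triple sums are the element-wise expansions of $(A^3)_{ii}$, $(AA^TA)_{ii}$, $(A^TA^2)_{ii}$ and $(A^2A^T)_{ii}$.

```python
def pattern_clustering(A):
    """Per-node counts t, maxima T and coefficients C for the four triangle patterns.

    A[i][j] = 1 means i -> j; the diagonal is assumed to be zero.
    """
    n = len(A)
    idx = range(n)
    result = []
    for i in idx:
        d_in = sum(A[j][i] for j in idx)
        d_out = sum(A[i][j] for j in idx)
        d_bi = sum(A[i][j] * A[j][i] for j in idx)
        t = {
            "cyc": sum(A[i][j] * A[j][h] * A[h][i] for j in idx for h in idx),  # (A^3)_ii
            "mid": sum(A[i][j] * A[h][j] * A[h][i] for j in idx for h in idx),  # (A A^T A)_ii
            "in": sum(A[j][i] * A[j][h] * A[h][i] for j in idx for h in idx),   # (A^T A^2)_ii
            "out": sum(A[i][j] * A[j][h] * A[i][h] for j in idx for h in idx),  # (A^2 A^T)_ii
        }
        T = {
            "cyc": d_in * d_out - d_bi,
            "mid": d_in * d_out - d_bi,
            "in": d_in * (d_in - 1),
            "out": d_out * (d_out - 1),
        }
        # Assumption: a pattern that node i cannot form gets a coefficient of 0.
        C = {k: (t[k] / T[k] if T[k] > 0 else 0.0) for k in t}
        result.append({"t": t, "T": T, "C": C})
    return result


# Feed-forward triangle: 0 -> 1, 0 -> 2, 1 -> 2.
A = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
res = pattern_clustering(A)
```

In this example node 0 sees the triangle as an "out" pattern, node 1 as a "middleman" pattern and node 2 as an "in" pattern; no node sees a cycle. For the weighted coefficients of eq. (17), the same sums would be applied to $W^{[1/3]}$ while the maxima $T_i^{*}$ stay based on the binary degrees.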

Two remarks on equations (16) and (17) are in order. First, note that, for $\{*\} =
\{cyc, mid, in, out\}$: (i) when $A$ is symmetric, $C_i^{*} = C_i$; (ii) when $W$ is binary, $\tilde{C}_i^{*} = C_i^{*}$; (iii)
when $W$ is symmetric, $\tilde{C}_i^{*} = \tilde{C}_i$. Second, in random graphs one still has that $E[C_i^{*}] = p$ and
$E[\tilde{C}_i^{*}] = \left(\frac{3}{4}\right)^3 p$.

Finally, network-wide clustering coefficients $C^{*}$ and $\tilde{C}^{*}$ can be built for any triangle
pattern $\{cyc, mid, in, out\}$ by averaging individual coefficients over the $N$ nodes.

These aggregate coefficients can be employed to compare the relevance of, say, cycle-like
clustering among different networks, but not to assess the relative importance of cycle-like
and middlemen-like clustering within a single network. In order to perform within-network
comparisons, one can instead compute the fraction of all triangles that belong to the pattern
$\{*\} \in \{cyc, mid, in, out\}$ in $i$'s neighborhood, that is:

$$f_i^{*} = \frac{t_i^{*}}{t_i^D}, \qquad \tilde{f}_i^{*} = \frac{\tilde{t}_i^{*}}{\tilde{t}_i^D}, \tag{18}$$

and then average them over all nodes. Since for $\{*\} \in \{cyc, mid, in, out\}$ we have that
$\sum_{*} f_i^{*} = 1$ and $\sum_{*} \tilde{f}_i^{*} = 1$, the above coefficients can be used to measure the contribution
of each single pattern to the overall clustering coefficient. Notice that, in the case of BDNs,
$f^{*}$ coefficients simply recover the recurrence of each pattern in the network, as computed in [28].
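
A short sketch of the pattern shares for the binary case is given below. It assumes the shares are defined as $t_i^{*}/t_i^D$, with $t_i^D$ obtained as the sum of the four pattern counts (eq. (13)); the function name `pattern_shares` and the example graph are hypothetical, and nodes without any triangle are assigned zero shares by convention.

```python
def pattern_shares(A):
    """Share of each triangle pattern among all triangles around every node."""
    n = len(A)
    idx = range(n)
    shares = []
    for i in idx:
        t = {
            "cyc": sum(A[i][j] * A[j][h] * A[h][i] for j in idx for h in idx),
            "mid": sum(A[i][j] * A[h][j] * A[h][i] for j in idx for h in idx),
            "in": sum(A[j][i] * A[j][h] * A[h][i] for j in idx for h in idx),
            "out": sum(A[i][j] * A[j][h] * A[i][h] for j in idx for h in idx),
        }
        t_all = sum(t.values())  # equals t_i^D by eq. (13)
        # Assumption: nodes without triangles get zero shares.
        shares.append({k: (v / t_all if t_all > 0 else 0.0) for k, v in t.items()})
    return shares


# Edges: 0 -> 1, 1 -> 2, 2 -> 0 (a cycle) plus 0 -> 3 and 3 -> 1.
A = [[0, 1, 0, 1],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 0]]
f = pattern_shares(A)
```

Node 0 takes part in one cycle (with nodes 1 and 2) and one "out" triangle (with nodes 3 and 1), so each of these two patterns accounts for half of its triangles.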

Empirical Application. The above concepts can be meaningfully illustrated in the case of the
empirical network describing world trade among countries (i.e., the "world trade network",
WTN in what follows). Source data is provided by [29] and records, for any given year,
imports and exports from/to a large sample of countries (all figures are expressed in current
U.S. dollars). Here, for the sake of exposition, we focus on year 2000 only [40]. We choose
to build an edge between any two countries in the WTN if there is a non-zero trade between
them and we assume that edge directions follow the flow of commodities. Let $x_{ij}$ be $i$'s
exports to country $j$ and $m_{ji}$ imports of $j$ from $i$. In principle, $x_{ij} = m_{ji}$. Unfortunately,
due to measurement problems, this is not the case in the database. In order to minimize
this problem, we will focus here on "adjusted exports" defined as $e_{ij} = (x_{ij} + m_{ji})/2$ and we
build a directed edge from country $i$ to country $j$ if and only if country $i$'s adjusted exports
to country $j$ are positive. Thus, the generic entry $a_{ij}$ of the adjacency matrix is equal to
one if and only if $e_{ij} > 0$ (and zero otherwise). Notice that, in general, $e_{ij} \neq e_{ji}$. In order
to weight edges, adjusted exports can be tentatively employed. However, exporting levels
are trivially correlated with the "size" of exporting/importing countries, as measured e.g.
by their gross domestic products (GDPs). To avoid such a problem, we firstly assign each
existing edge a weight equal to $\tilde{w}_{ij} = e_{ij}/GDP_i$, where $GDP_i$ is country $i$'s GDP expressed
in 2000 U.S. dollars. Secondly, we define the actual weight matrix as:

$$W = \{w_{ij}\} = \left\{\frac{\tilde{w}_{ij}}{\max_{h,k}\{\tilde{w}_{hk}\}}\right\} \tag{19}$$

to have weights in the unit interval. Each entry $w_{ij}$ tells us the extent to which country $i$ (as
a seller) depends on $j$ (as a buyer). The out-strength of country $i$ (i.e. $i$'s exports-to-GDP
ratio) will then measure how $i$ (as a seller) depends on the rest of the world (as a buyer).
Similarly, in-strengths denote how dependent is the rest of the world on $i$ (as a buyer) [41].
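
The construction of the adjacency and weight matrices described above can be sketched as follows. All numbers are invented for illustration and do not come from the trade database; the function name `build_trade_network` and its argument layout are assumptions.

```python
def build_trade_network(x, m, gdp):
    """Build the binary and weighted directed trade matrices from raw flows.

    x[i][j]: exports of i to j as reported by i.
    m[j][i]: imports of j from i as reported by j.
    gdp[i]:  GDP of country i.
    Assumes that at least one adjusted export flow is positive.
    """
    n = len(x)
    # Adjusted exports: average of the two reports of the same flow.
    e = [[(x[i][j] + m[j][i]) / 2 if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    A = [[1 if e[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    # Preliminary weights: adjusted exports as a share of the exporter's GDP.
    w_raw = [[e[i][j] / gdp[i] for j in range(n)] for i in range(n)]
    w_max = max(max(row) for row in w_raw)
    # Eq. (19): rescale so that all weights lie in the unit interval.
    W = [[w / w_max for w in row] for row in w_raw]
    return A, W


# Invented flows for three countries.
x = [[0, 10, 0],
     [4, 0, 6],
     [0, 0, 0]]
m = [[0, 4, 0],
     [12, 0, 0],
     [0, 6, 0]]
gdp = [110, 40, 50]
A, W = build_trade_network(x, m, gdp)
```

In this toy example the flow from country 0 to country 1 is reported as 10 by the exporter and 12 by the importer, so the adjusted export is 11; the largest preliminary weight (6/40, from country 1 to country 2) becomes 1 after rescaling.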

The resulting WTN comprises N = 187 nodes/countries and 20105 directed edges. The
density is therefore very high ($\delta = 0.5780$). As expected, the binary WTN is substantially
symmetric: there is a 0.9978 correlation [42] between in- and out-degree (see Figure 2) and
the (non-scaled) S'-measure introduced in [30] is close to zero (0.00397), indicating that the
underlying binary graph is almost undirected.




FIG. 2: WTN: In- vs. out-degree in the binary case. 

Thus, in the binary case, there seems to be no value added in performing a directed
analysis: since $A$ is almost symmetric, we should not detect any significant differences
among clustering measures for our four directed triangle patterns. Indeed, we find that
$C^D = 0.8125$, while $C^{cyc} = 0.8123$, $C^{mid} = 0.8127$, $C^{in} = 0.8142$, $C^{out} = 0.8108$ [43]. The
fact that $C^D > \delta$ also indicates that the binary (directed) WTN is more clustered than it
would be if it were random (with density $\delta = 0.5780$). Finally, Figure 3 shows that individual
CCs ($C_i^D$) are negatively correlated with total degree ($d_i^{tot}$), the correlation coefficient being
-0.4102. This implies that countries with few (respectively, many) partners tend to form
very (respectively, poorly) connected clusters of trade relationships.




FIG. 3: WTN: Log-log plot of overall directed clustering coefficient vs. total-degree in the binary 
case. 



The binary network does not take into account the heterogeneity of export flows carried
by edges in the WTN. Indeed, when one performs a WDN analysis on the WTN, the picture
changes completely. To begin with, note that weights $w_{ij}$ are on average very weak (0.0009)
but quite heterogeneous (weight standard deviation is 0.0073). In fact, the weight distribution
is very skewed and displays a characteristic power-law shape (see Figure 4) with a slope
around -2.

FIG. 4: WTN: Log-log plot of the weight distribution.

The matrix $W$ is now weakly asymmetric. As Figure 5 shows, in- and out-strengths
are almost uncorrelated: the correlation coefficient is 0.09 (not significantly different from
zero). Nevertheless, the (non-scaled) S'-measure is still very low (0.1118), suggesting that an
undirected analysis would still be appropriate. We will see now that, even in this borderline
case, a weighted directed analysis of CCs provides a picture which is much more precise
than (and sometimes at odds with) that emerging in the binary case.

FIG. 5: WTN: Log-log plot of in-strength vs. out-strength.



First, unlike in the binary case, the overall average CC ($\tilde{C}^D$) is now very low (0.0007)
and significantly smaller than its expected value (0.2438) in random graphs (with the same
density $\delta = 0.5780$, but independently, uniformly-distributed weights). Notice, however,
that $\tilde{C}^D$ is almost equal to its expected value in directed graphs characterized by the same
topology (as defined by the adjacency matrix $A$) and the same weight distribution (as defined
by the non-zero elements in $W$, randomly reshuffled across edges), which turns out to be equal
to 0.0005 (with a standard deviation of 0.0001) [44].
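
One replication of this reshuffling benchmark can be sketched as below. The function name `reshuffle_weights`, the seed and the example matrix are assumptions; averaging a clustering statistic over many such replications would give a benchmark value of the kind reported above.

```python
import random


def reshuffle_weights(W, rng):
    """Randomly reassign the observed non-zero weights across the existing edges.

    The topology (positions of non-zero entries) is kept fixed; only the
    weights attached to those positions are permuted.
    """
    n = len(W)
    positions = [(i, j) for i in range(n) for j in range(n) if W[i][j] > 0]
    weights = [W[i][j] for i, j in positions]
    rng.shuffle(weights)
    W_new = [[0.0] * n for _ in range(n)]
    for (i, j), w in zip(positions, weights):
        W_new[i][j] = w
    return W_new


W = [[0.0, 0.2, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.05, 0.0]]
W_shuffled = reshuffle_weights(W, random.Random(1))  # seed chosen arbitrarily
```

The reshuffled matrix has the same set of edges and the same multiset of weights as the original one, so any difference in clustering between the two can only come from how weights are placed on the topology.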

Second, $\tilde{C}_i^D$ is now positively correlated with total strength (the correlation coefficient is
0.6421), cf. Figure 6. This means that, when weight heterogeneity is taken into account, the
implication we have drawn in the binary case is reversed: countries that are more strongly
connected tend to form more strongly connected trade circles. Indeed, $\tilde{C}_i^D$ exhibits an almost
null correlation with total degree, see Figure 7.

Third, although the weighted network is only weakly asymmetric, there is a substantial
difference in the way clustering is coupled with exports and imports. $\tilde{C}_i^D$ is almost
uncorrelated with in-strength (Figure 8), while a positive slope is still in place when $\tilde{C}_i^D$ is plotted
against out-strength. Hence, the low clustering level of weakly connected countries seems
to depend mainly on their weak exporting relationships.

Fourth, weighted CC coefficients associated to different triangle patterns now show a
relevant heterogeneity: $\tilde{C}^{*}$ range from 0.0004 (cycles) to 0.0013 (out). In addition, cycles
only account for 18% of all triangles, while the other three patterns account for about 27%
each. Therefore, countries tend to form trade cycles less frequently, possibly because they
involve economic redundancies.

Finally, CCs for different triangle patterns correlate with strength measures in different
ways. While $\tilde{C}_i^{cyc}$, $\tilde{C}_i^{mid}$ and $\tilde{C}_i^{in}$ are positively and strongly correlated with total strength,
$\tilde{C}_i^{out}$ is not (see Figure 9): countries tend to maintain exporting relationships with connected
pairs of partners independently of the total strength of their trade circles.

FIG. 6: WTN: Log-log plot of overall CC vs. total strength in the WDN case.

FIG. 7: WTN: Log-log plot of overall CC vs. total degree in the WDN case.



Concluding remarks. In this paper, we have extended the clustering coefficient (CC), orig- 
inally proposed for binary and weighted undirected graphs, to directed networks. We have 
introduced different versions of the CC for both binary and weighted networks. These coeffi- 
cients count the number of triangles in the neighborhood of any node independently of their 
actual pattern of directed edges. In order to take edge directionality fully into account, we 
have defined specific CCs for each particular directed triangle pattern (cycles, middlemen, 
ins and outs). For any CC, we have also provided its expected value in random graphs. Fi- 
nally, we have illustrated the use of directed CCs by employing world trade network (WTN) 
data. Our exercises show that directed CCs can describe the main clustering features of the 
underlying WTN's architecture much better than their undirected counterparts.

TABLE I: A taxonomy of the patterns of directed triangles and their associated clustering coefficients. For each pattern, we show the
graph associated to it, the expression that counts how many triangles of that pattern are actually present in the neighborhood of $i$ ($t_i^{*}$), the
maximum number of such triangles that $i$ can form ($T_i^{*}$), for $* = \{cyc, mid, in, out, D\}$, and the associated clustering coefficients for BDNs
and WDNs. Note. In the last column: $\hat{W} = W^{[1/3]} = \{w_{ij}^{1/3}\}$.

FIG. 8: WTN: Log-log plot of overall CC vs. in-strength in the WDN case.

FIG. 9: WTN: Log-log plot of $\tilde{C}^{out}$ vs. total strength in the WDN case.

[34] If some $w_{ij} > 1$, one can divide all weights by $\max_{i,j}\{w_{ij}\}$.

[35] That is, sets of topologically-equivalent subgraphs of a network.

[36] Of course, by a symmetry argument, they actually reduce to 4 distinct patterns (e.g.
those in the first column). We will keep the classification in 8 types for the sake of exposition.

[37] The CC in (10) is similar to that presented by [20] but takes explicitly into account edge
directionality in computing the maximum number of directed triangles ($T_i^D$). Conversely,
those authors set $T_i^D = d_i(d_i - 1)$.

[38] That is, $w_{ij}$ is a random variable equal to zero with probability $1 - p$ and equal to a U(0, 1]
with probability p. Of course this admittedly naive assumption is made for mathematical 
convenience to benchmark our results in a setup where one is completely ignorant about 
the true (observed) weight distribution. In empirical applications, one would hardly expect 
observed weights to follow such a trivial distribution and more realistic assumptions should 
be made. For example, expected value of CCs might be computed by bootstrapping (i.e., 
reshuffling) the observed empirical distribution of weights in W across the same topological 
graph structure, as defined by the observed adjacency matrix A. See also below. 

[39] These patterns can be also labeled as "broken" cycles, where the two neighbors whom i 
attempts to build a cycle with, actually invert the direction of the flow. 

[40] That is, the most recent year available in the database. This also allows us to keep our
discussion similar to that in



[41] Dividing by $GDP_j$ would of course require a complementary analysis. Notice also that [19]
define adjusted exports as e(i,j) = e(j,i) = [x(i,j) + m(j,i) + x(j,i) + m(i,j)]/2, thus obtaining
an undirected binary/weighted network by construction.

[42] Here and in what follows, by correlation (or correlation coefficient) between two
variables $X$ and $Y$, we mean the Pearson product-moment sample correlation, defined as
$\sum_i (x_i - \bar{x})(y_i - \bar{y})/[(N - 1)s_X s_Y]$, where $s_X$ and $s_Y$ are sample standard deviations. All
correlation coefficients have been computed on original (linear) data, although log-log plots are
sometimes displayed.

[43] Accordingly, one has that $f^{cyc} = 0.2499$, $f^{mid} = 0.2501$, $f^{in} = 0.2531$ and $f^{out} = 0.2469$.

[44] To compute such expected values, we randomly reshuffled WTN weights in $W$ (by keeping
$A$ fixed) and computed averages/standard deviations of CCs over M = 10000 independent
replications.